ABSTRACT: Evaluation of technical and scientific
translations dealing with complex subject matter has
shown that a) the majority of errors made by the
translators involve terminology, and b) the translators
spend a great deal of their time searching for the
correct equivalents of technical-scientific terms to be
translated. This paper describes a technique of
generating terminological digests speedily on terminals
connected to a computer in order to overcome these
impediments and aid the translator in streamlining the
translation production process. A terminological
digest represents the glossarial framework of a
translation, a unique dictionary constructed
automatically for the text to be translated. The user
can produce a terminological digest by invoking the
appropriate program on his terminal, entering on the
keyboard the terms he wishes to have looked up. All
terms entered are immediately retrieved from an
up-to-date scientific-technical dictionary and provided
with target language equivalents and other pertinent
information. At the user's option, the dictionary
entries may be presented singly, as a list in the order
of entering the terms (e.g., the order in which they
occur in the text to be translated), or as an
alphabetically-sorted list. These lists may be
displayed, typed out, or printed and saved as
"minidictionaries" for a particular field.


THE PROLIFERATION OF SPECIALIZED TERMINOLOGY


As has been emphasized in a variety of works on
translation [1,2,3,4,5,6], mere knowledge of the general
vocabulary and the grammars of two or more languages does not
necessarily enable an educated person to translate
scientific, economic, technical or legal literature. To
translate specialized material of this type correctly, an
acquaintance with the subject matter and a command of the
proper terminology are required. For the professional
translator, one of the most exasperating aspects of
translating modern texts on a variety of subjects is the
inordinate amount of searching and learning time he has to
expend to evaluate special terms employed by the authors of
source language documents. It has been claimed that up to
60% of a conscientious translator's work time is consumed in
tracking down proper terminological information [7].


Not only is the volume of new terminology increasing
rapidly, but the technical vocabulary is also being used
more and more loosely by various specialists in the field
[8]. Added to the proliferation of unabbreviated specialized
terminology must be the ever-enlarging usage of acronyms and
initialisms, particularly in information processing, which
frequently cannot be decoded even by experts in a given
field without the aid of a proper dictionary [9]. The
number of acronyms formally collected in the U.S. has now
swelled to nearly 103,000 terms [10] and is continuing to
grow [11,12,13,14]. However, even the most up-to-date
printed dictionary cannot maintain a rate of speed parallel
with the burgeoning growth of specialized terminology [15].
That dictionaries are obsolete is especially evident in
computer systems terminology [16], which is proliferating at
a rate which could perhaps be compared to the proliferation
of higher-level languages in the programming field.


It is, of course, true that as the use of higher-level
languages can effectively assist non-data-processing
professionals in communicating with a computer in their own
professional jargon [17], so special terminology can make
communication for a scientist or technologist more efficient
and convenient when interacting with fellow specialists
[18]. And just as the development of programming languages
is likely to go forward [19], one would expect a continual
growth of new terminology, given the rate at which
technological concepts are developed. In fact, Jean Sammet's
statement concerning one of the major reasons for the
proliferation of programming languages can be applied
virtually unchanged to the terminological explosion
[17:310]: "Some of the causes and motivations behind the
development of these languages rest in quirks of human
nature rather than technological progress or lack thereof.
Thus, as long as people find it fun to develop languages, as
long as they want something which is specifically tailored
exactly to their needs, and as long as they are going to
find picayune faults with the existing languages, there is
very little that technical progress can do to reduce the
number of languages".*


Thus, although the translator may be familiar with the
scientific or technical subject matter and its fundamental
terminology, the number of terms which take on different
shades of meaning, cover new concepts when used by various
authors, or are outright neologisms, may tend to confound
the translator in his attempts to conscientiously determine
the correct translation [20,21,22]. Compounding this
confusion is the fact that at the rate at which new terms
are coined to communicate technological development, even
the best technical and scientific bilingual or multilingual
dictionaries are out of date by the time they appear in
print [15,22,23,24,25,26].


IMPROVING THE ACCESS TO SPECIALIZED TERMINOLOGY


To overcome the impediments in scientific and technical
translation arising from the dispersion and inaccessibility
of terminology, a variety of suggestions have been made
[7,8,27,28,29,30,31,32] which may be summarized as follows:


1. Constructive efforts should be made by closer
coordination between the terminology boards of the
various disciplines, technical associations and
laboratory information services to standardize jargon
appearing in printed form.


2. A coherent approach should be established to ensure that
all meanings of a scientific/technical term are recorded
in the dictionary with reference to the field to which
the meaning applies.


3. The rate at which special terms are to be added to
existing monolingual glossaries and bilingual or
multilingual dictionaries should be increased, eventually
contributing to a reduction in the translator's overall
search time for specialized terminology.


4. The latest dictionaries and reference manuals should be
at the translator's immediate disposal so that the proper
terms can be looked up speedily and conveniently.


5. The translator should be able to rapidly obtain a
terminological digest of a text to be translated, i.e., a
list of searched-for terms extracted from the dictionary
in order of textual occurrence and/or alphabetical
occurrence.


* Of the large number of synonymous computer technology
terms, only one example is given for "the function to
combine object modules to produce a single program," as used
by the data processing community: linkage editor, link
editor, builder, winder, loader, linking loader, relocatable
loader, linkage loader, linking (relocatable) loader,
collector, job loader [16].


This paper does not address itself to objectives 1 and 
2, viz. closer coordination between the parties to 
standardize jargon and to ensure that all meanings of 
technical terms are documented. Such endeavors are the
domain of the technical experts and technical writers. 
Obviously, the mere availability of computers does not 
constitute a remedy for controlling the jargon explosion or 
for solving human communications problems. However, the 
opportunity to interact with computers on-line in a
terminal-oriented environment does provide the potential for 
finding effective solutions to objectives 3 through 5 
without undue emphasis on the willingness of the 
technologists to document and standardize their specialized 
terminology expeditiously. 


Approaches to meeting objectives 3 and 4 have been 
described in detail elsewhere [33,34,35,36,37]. In summary, 
objective 3 can be met by giving the user access to a 
time-shared computer system supporting data bases and 
dictionary maintenance programs to allow bilingual or
multilingual dictionary generation and updating. Objective 4 
involves dictionary lookup and browse programs for 
presenting dictionary entries in hardcopy or video format on 
appropriate terminal devices or on high-speed printers 
capable of producing high quality copy. Concomitant
requirements would entail (1) context editing, making it 
easy both to change stored translation texts and 
dictionaries and to input new ones, and (2) formatting, to 
produce a variety of professional-looking translation and 
dictionary layouts. 


The approach which is described on the following pages 
involves objective 5 and is oriented toward reducing the 
manual searching and sifting time which the translator 
requires to determine the proper terminology in a
translation, and thereby toward increasing his productivity. 
The approach deals with the semi-automatic generation of 
terminological digests of texts to be translated, a method 
whose basic ideas can be traced back to the work on 
text-related glossaries by the West German Bundessprachenamt
[38].


The Bundessprachenamt (Übersetzerdienst der Bundeswehr)
analyzed the types of errors in technical-scientific 
translations and found, among other things, that the rate of 
mistranslated and untranslated specialized terminology 
increased proportionately to the increase in the technical 
difficulty level of source texts. On the greatest difficulty 
level, terminological errors accounted for 62.1% of the 
translation error total (where the range of error types 
included such categories as orthographical mistakes, 
punctuation, wrong German preposition, inflectional errors, 
English word order, text inaccuracies and omission of 
information). Concurrent time and productivity studies were 
conducted indicating that translators using dictionary lists 
with special source/target language terms which were 
exclusively related to the technical-scientific texts to be 
translated could reduce the error rate by approximately 40% 
and increase their productivity by over 50% as compared to 
their colleagues who had well-equipped conventional 
technical-scientific libraries and the consultation of their 
colleagues at their disposal [39].


GENERATING TERMINOLOGICAL DIGESTS 




As indicated above, a terminological digest is a list 
of terms extracted from a main dictionary in the order of 
textual occurrence or alphabetical occurrence, i.e., the 
desired terminological framework of a text to be translated. 
The text to be processed may be of arbitrary length. Figure 
1 represents an example of a text portion to be translated. 
Figure 2 shows the automatically-produced terminological 
digest of this text in the order in which the desired terms 
occur, and Figure 3 shows the terminological digest in 
alphabetic order. 


The user has at his disposal a terminal, either a 
typewriter or a video display unit (Figure 4), which may be 
connected over regular telephone lines to a computer. After
having turned on the terminal and, if required, dialed up 
the computer and made the connection, production of a 
terminological digest is achieved by first invoking the 
appropriate program and then entering the terms one wishes 
to have looked up at the keyboard. 


If the source text is stored in machine-readable format 
in the system, as is the case for text produced by a variety 
of text-processing systems, it may also be displayed and 
examined by rolling it up or down ("Scrolling") on the 
surface of a display screen. However, depending on the 
translation job, material to be translated may only be 
obtainable in non-machine-processable format, in which case 
it cannot be stored in the machine. Moreover, even though
source text may be available in machine-processable format,
it may not be accessible on a particular computer system 
with dictionary files, because of lack of storage space or 
because of computer installation policy. In fact, 
secretarial copy aids for holding manuscripts may make 
manual page flipping competitive with, and perhaps even more 
economical than, scrolling of machine-readable source text. 


Temporary lack of a terminal, or assignment 
considerations between translators and typists, may call for 
separation of the task of identifying special terms for 
terminological digest production and of entering these terms 
in the system. For example, the translator may wish to
encircle the desired terminology in the source text and 
submit it to a typist who may input these terms on-line or 
off-line, by terminal or typewriter (e.g., an MC/ST or 
MT/ST). 


All terms entered are immediately looked up by the 
program in a comprehensive dictionary, which may be oriented 
toward a special subject area, and provided with target 
language equivalents and other germane information. At the
user's option, the terms may be presented singly (i.e.,
immediate display of a term in dictionary context as soon as
the term is entered or selected), as a list in the order of
entering (normally the order in which they occur in the text
to be translated), or as an alphabetically-sorted list. The
terminological digests may also be saved and used
repetitively, automatically edited as any other translation 
document [33], directed to offline output devices for 
high-speed printing or punching, or transmitted by 
telecommunications to other terminals or computers. 


The dictionary storage organization is designed to cope 
with the growth potential of multilingual dictionaries, 
where an extremely large number of entries may eventually be 
accumulated. Details of this organization and of the 
associated dictionary lookup procedure are described 
elsewhere [33]. During execution of a single dictionary
lookup, the desired source language term is compared with a 
table of source language terms whose corresponding 
dictionary entries occur at fixed intervals in the 
dictionary file. Dependent upon a high/low/equal compare 
result, appropriate routines are called to move the desired 
entry into a storage buffer for a video display unit or a 
typewriter terminal. Unless the buffer is completely 
occupied, adjacent entries are retrieved and packed into it 
until it is filled. This dictionary excerpt is then flashed 
onto the display screen. If a video display unit is not 
used, the entry is printed out on the typewriter. 
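

The single lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the system's actual code (which is documented in [33]): the dictionary file is modeled as a sorted in-memory list, and the interval of four entries, the buffer capacity of three entries, and the names `build_index`, `lookup`, `INTERVAL` and `BUFFER_SIZE` are all hypothetical.

```python
from bisect import bisect_right

INTERVAL = 4      # assumed spacing between entries listed in the table
BUFFER_SIZE = 3   # assumed number of entries that fit in the display buffer

# A small sorted stand-in for the dictionary file (entries from Figure 3).
ENTRIES = [
    ("address", "Adresse; Ansprache; adressieren; anreden"),
    ("allocate", "zuordnen, zuteilen, anweisen"),
    ("bit", "Bit [SYST], Binaerziffer [SYST]"),
    ("block", "Block [SYST]; Satzblock [SYST]"),
    ("contents", "Inhalt"),
    ("fetch", "Abruf [SYST]; abrufen [SYST]; abholen"),
    ("main storage", "Hauptspeicher [SYST]"),
    ("storage", "Speicher [SYST]; Speicherung"),
    ("store", "speichern [SYST]; lagern, aufspeichern"),
]


def build_index(entries):
    """Table of (term, position) for every INTERVAL-th entry of the file."""
    return [(entries[i][0], i) for i in range(0, len(entries), INTERVAL)]


def lookup(term, entries, index):
    """Return the buffer contents for `term`, or None if the term is absent."""
    keys = [key for key, _ in index]
    slot = bisect_right(keys, term) - 1   # last table term <= desired term
    if slot < 0:
        return None                       # term sorts before the whole file
    start = index[slot][1]
    for pos in range(start, min(start + INTERVAL, len(entries))):
        if entries[pos][0] == term:       # "equal": entry found
            # Pack the entry and its neighbors until the buffer is full.
            return entries[pos:pos + BUFFER_SIZE]
        if entries[pos][0] > term:        # "high": passed where it would be
            break
    return None


index = build_index(ENTRIES)
print(lookup("fetch", ENTRIES, index))
```

The table keeps the comparison work small: only the table and at most one interval of the file are examined per term, however large the dictionary grows.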


For a terminological digest consisting of a list of 
entries, the same dictionary lookup procedure is used except
that the system delays the search until all terms to be
looked up have been entered. At this point the lookup 
procedure is applied iteratively to each term of the entered 
collection in order to obtain the corresponding dictionary 
entry. When this process has been completed, the resulting 
entries are displayed, typed out or printed as a group. 
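

As a rough sketch of this deferred, list-oriented lookup: the function below collects nothing itself but takes the already-entered terms and applies a lookup to each in turn. The names `make_digest` and `DICTIONARY` are hypothetical, a plain Python mapping stands in for the dictionary file, and the message for missing terms is copied from Figure 2. Dropping duplicates only in the sorted digest is an inference from Figures 2 and 3, not a rule stated in the text.

```python
MISSING = "*** FEHLT IM WOERTERBUCH (IST ABFRAGE FALSCH BUCHSTABIERT?) ***"

# Stand-in for the machine dictionary (entries taken from Figure 2).
DICTIONARY = {
    "main storage": "Hauptspeicher [SYST]",
    "bit": "Bit [SYST], Binaerziffer [SYST]",
    "fetch": "Abruf [SYST]; abrufen [SYST]; abholen",
}


def make_digest(terms, dictionary, alphabetical=False):
    """Look up every collected term once entry is complete."""
    if alphabetical:
        # Assumed behavior: each term appears once in the sorted digest.
        ordered = sorted(set(terms), key=str.lower)
    else:
        ordered = list(terms)             # order of entry, repeats kept
    digest = []
    for term in ordered:
        equivalents = dictionary.get(term.lower())
        if equivalents is None:
            digest.append(f"{term}\n{MISSING}")
        else:
            digest.append(f"{term}: {equivalents}")
    return digest


entered = ["main storage", "bit", "fetch-type", "fetch", "bit"]
for line in make_digest(entered, DICTIONARY, alphabetical=True):
    print(line)
```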


Depending upon the translation task or the size of the 
digest, the user can cause immediate digest display, 
type-out on his terminal, or printout on a high-speed 
printer, or he can save it as a file for future translation 
work and possible additional terminological and statistical 
investigations. At his option, he may also cause the system 
to generate an alphabetically-sorted digest.* This could be 
especially helpful if more than one translator works on one
large text, each concentrating on sections for which
terminological digests in text order and a global
alphabetically-sorted digest encompassing the entire text
(Figure 5) are generated. Although different sections of the
same text are translated by different translators,
consistency of terminology is maintained by usage of the 
machine dictionary, ensuring that the same terms are always 
translated in the same way [40]. A specialist in an 
editorial function may decide that some editing of the 
terminological digest is required before it is used by the 
individual translators. For example, a manual on unit 
record machines in electronic data processing might contain, 
among other things, the terms "card stacker" (English/German
dictionary equivalents: Kartenablage, Ablagefach), "vertical
line" (English/German dictionary equivalents: Senkrechte,
Vertikale), and "level" (English/German dictionary
equivalents: Ebene, Ordnung, Stufe, Pegel, Niveau). All of
those translations may be valid within the same text.
However, the specialist may conclude

a) that "Ablagefach" must be edited out of the
terminological digest to maintain uniformity in equipment
terminology,

b) that although either "Senkrechte" or "Vertikale"
could be edited out, they may be used interchangeably by the
translators because of their complete unambiguity in German,
and

c) that "Ebene, Ordnung, Stufe, Pegel, Niveau" must be
left untouched, since each translation may have to be used
even within a small stretch of text, so that the choice must
therefore be left to the discretion of the translator.

By using the automatic editing facility [33], changes to the
terminological digests can be made extremely rapidly.
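

The team arrangement and the specialist's editing step can be illustrated with a small sketch. It does not reproduce the editing facility of [33], which is not described here; it only shows the intended effect. The function names `global_term_list` and `edit_out`, the section labels, and the representation of a digest as a mapping from term to a list of equivalents are assumptions made for illustration.

```python
def global_term_list(section_terms):
    """Merge the term lists of all sections into one alphabetical list."""
    merged = {term.lower() for terms in section_terms.values() for term in terms}
    return sorted(merged)


def edit_out(digest, term, unwanted):
    """Return a copy of the digest with one equivalent removed from an entry."""
    edited = dict(digest)
    edited[term] = [e for e in digest[term] if e != unwanted]
    return edited


# Terms selected by two translators for their sections (hypothetical split).
sections = {
    "PART I": ["card stacker", "level"],
    "PART II": ["vertical line", "level"],
}

# Dictionary equivalents quoted in the text above.
digest = {
    "card stacker": ["Kartenablage", "Ablagefach"],
    "vertical line": ["Senkrechte", "Vertikale"],
    "level": ["Ebene", "Ordnung", "Stufe", "Pegel", "Niveau"],
}

print(global_term_list(sections))
print(edit_out(digest, "card stacker", "Ablagefach")["card stacker"])
```

Because every translator draws on the same edited digest, the decision made once by the specialist carries over to all sections of the text.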


* See Appendix for memory layout and sort mechanism in
terminological digest generation.


KEY IN STORAGE


For purposes of protection and recording of 
references and changes, main storage is 
divided into blocks of 2,048 bytes, each 
block having an address that is a multiple 
of 2,048. A control field, called "key in 
storage", is associated with each block of 
storage. 


The key in storage has the following
format:

    +---------+---+---+---+
    |   ACC   | F | R | C |
    +---------+---+---+---+
     0         4       6


The bit positions in the key are allo- 
cated as follows: 


Access-Control Bits (ACC): Bits 0-3 are 
matched against the four-bit protection key 
whenever information is stored, or whenever 
information is fetched from a location that 
is protected against fetching. 


Fetch-Protection Bit (F): Bit 4 controls 
whether protection applies to fetch-type 
references: a zero indicates that only 
store-type references are monitored and 
that fetching with any protection key is 
permitted; a one indicates that protection 
applies to both fetching and storing. No 
distinction is made between the fetching of 
instructions and of operands. 


Reference Bit (R): Bit 5 normally is set 
to one each time a location in the corres- 
ponding storage block is referred to either 
for storing or for fetching of information. 
This bit is associated with dynamic address 
translation. 


Change Bit (C): Bit 6 is set to one each 
time information is stored into the corres- 
ponding storage block. This bit is asso- 
ciated with dynamic address translation. 


The key in storage is not part of 
addressable storage. The program can 
explicitly place information in all seven
bits of the key by SET STORAGE KEY, and the 
contents of the key can be inspected by 
INSERT STORAGE KEY. Additionally, the 
instruction RESET REFERENCE BIT provides a 
means of inspecting the reference and 
change bits and of setting the reference 
bit to zero. 


PROTECTION 


The protection facility is provided to 
protect the contents of main storage from 
destruction or misuse caused by erroneous
or unauthorized storing or fetching by the
program. It provides protection against


Figure 1: Example of a text portion to be translated.
key in storage: Speicherschluessel [S/370] [SYST]
protection: Schutz, Protektion, Beschuetzung; Schutzzoll [LEG]
reference: Hinweis, Bezugnahme Nachschlagen; mit Verweisungen versehen;
    durch Verweisungen finden
main storage: Hauptspeicher [SYST]
block: Block [SYST]; Satzblock [SYST]; blocken [SYST]; blockieren;
    in Bloecke formen
address: Adresse; Ansprache; adressieren; anreden
multiple: Vielfaches; vielfach
control field: Kontrollfeld [S/370]; Sortierfeld [SYST]
key in storage: Speicherschluessel [S/370] [SYST]
storage: Speicher [SYST]; Speicherung; Lagerung; Lagermiete [COM]
bit: Bit [SYST], Binaerziffer [SYST]; Bohrspitze [MECH]; Schluesselbart;
    kleines Stueckchen
allocate: zuordnen, zuteilen, anweisen
access-control bit: Zugriffs-Steuerungsbit [S/370]
match: abgleichen [SYST]; verbinden, paaren, paarweise verbinden;
    passend verbinden [MECH]; Gleiche(r, s), Zusammenbringen
protection key: Schutzschluessel [S/370] [SYST]
fetch: Abruf [SYST]; abrufen [SYST]; abholen
fetch-protection bit: Abrufsschutzbit [S/370] [SYST]
apply: zutreffen; anwenden, verwenden; auftragen [MECH]; bewerben
fetch-type
    *** FEHLT IM WOERTERBUCH (IST ABFRAGE FALSCH BUCHSTABIERT?) ***
indicate: anzeigen, angeben, andeuten
store-type
    *** FEHLT IM WOERTERBUCH (IST ABFRAGE FALSCH BUCHSTABIERT?) ***
fetch: Abruf [SYST]; abrufen [SYST]; abholen
store: speichern [SYST]; lagern, aufspeichern; Speicher [SYST];
    Lager, Magazin
instruction: Instruktion [SYST]; Anweisung, Belehrung; Lehre
operand: Operand (Parameter in einer Instruktion);
    operand specification = Spezifikation fuer einen Operanden
reference bit: Hinweisbit [S/370] [SYST]
refer: hinweisen, verweisen, sich beziehen; sich wenden
associate: in Verbindung stehen; assoziieren, vereinigen
dynamic address translation: dynamische Adressumsetzung [S/370] [SYST]
change bit: Veraenderungsbit [S/370] [SYST]
dynamic address translation: dynamische Adressumsetzung [S/370] [SYST]
addressable: adressierbar
storage: Speicher [SYST]; Speicherung; Lagerung; Lagermiete [COM]
SET STORAGE KEY
    *** FEHLT IM WOERTERBUCH (IST ABFRAGE FALSCH BUCHSTABIERT?) ***
contents: Inhalt
inspect: pruefen, untersuchen, beaufsichtigen
INSERT STORAGE KEY
    *** FEHLT IM WOERTERBUCH (IST ABFRAGE FALSCH BUCHSTABIERT?) ***
RESET REFERENCE BIT
    *** FEHLT IM WOERTERBUCH (IST ABFRAGE FALSCH BUCHSTABIERT?) ***
protection: Schutz, Protektion, Beschuetzung; Schutzzoll [LEG]
main storage: Hauptspeicher [SYST]
store: speichern [SYST]; lagern, aufspeichern; Speicher [SYST];
    Lager, Magazin
fetch: Abruf [SYST]; abrufen [SYST]; abholen
protection: Schutz, Protektion, Beschuetzung; Schutzzoll [LEG]


Figure 2: Terminological digest of a text portion to be
translated into German.
access-control bit: Zugriffs-Steuerungsbit [S/370]
address: Adresse; Ansprache; adressieren; anreden
addressable: adressierbar
allocate: zuordnen, zuteilen, anweisen
apply: zutreffen; anwenden, verwenden; auftragen [MECH]; bewerben
associate: in Verbindung stehen; assoziieren, vereinigen
bit: Bit [SYST], Binaerziffer [SYST]; Bohrspitze [MECH]; Schluesselbart;
    kleines Stueckchen
block: Block [SYST]; Satzblock [SYST]; blocken [SYST]; blockieren;
    in Bloecke formen
change bit: Veraenderungsbit [S/370] [SYST]
contents: Inhalt
control field: Kontrollfeld [S/370]; Sortierfeld [SYST]
dynamic address translation: dynamische Adressumsetzung [S/370] [SYST]
fetch: Abruf [SYST]; abrufen [SYST]; abholen
fetch-protection bit: Abrufsschutzbit [S/370] [SYST]
fetch-type
    *** FEHLT IM WOERTERBUCH (IST ABFRAGE FALSCH BUCHSTABIERT?) ***
indicate: anzeigen, angeben, andeuten
INSERT STORAGE KEY
    *** FEHLT IM WOERTERBUCH (IST ABFRAGE FALSCH BUCHSTABIERT?) ***
inspect: pruefen, untersuchen, beaufsichtigen
instruction: Instruktion [SYST]; Anweisung, Belehrung; Lehre
key in storage: Speicherschluessel [S/370] [SYST]
main storage: Hauptspeicher [SYST]
match: abgleichen [SYST]; verbinden, paaren, paarweise verbinden;
    passend verbinden [MECH]; Gleiche(r, s), Zusammenbringen
multiple: Vielfaches; vielfach
operand: Operand (Parameter in einer Instruktion);
    operand specification = Spezifikation fuer einen Operanden
protection: Schutz, Protektion, Beschuetzung; Schutzzoll [LEG]
protection key: Schutzschluessel [S/370] [SYST]
refer: hinweisen, verweisen, sich beziehen; sich wenden
reference: Hinweis, Bezugnahme Nachschlagen; mit Verweisungen versehen;
    durch Verweisungen finden
reference bit: Hinweisbit [S/370] [SYST]
RESET REFERENCE BIT
    *** FEHLT IM WOERTERBUCH (IST ABFRAGE FALSCH BUCHSTABIERT?) ***
SET STORAGE KEY
    *** FEHLT IM WOERTERBUCH (IST ABFRAGE FALSCH BUCHSTABIERT?) ***
storage: Speicher [SYST]; Speicherung; Lagerung; Lagermiete [COM]
store: speichern [SYST]; lagern, aufspeichern; Speicher [SYST];
    Lager, Magazin
store-type
    *** FEHLT IM WOERTERBUCH (IST ABFRAGE FALSCH BUCHSTABIERT?) ***


Figure 3: Alphabetically-sorted terminological digest of
text portion to be translated into German.
Figure 4: Basic types of typewriter and video display terminals.


[Diagram: the DOCUMENT TO BE TRANSLATED is divided into parts (PART I, ...),
each handled by one translator (TRANSLATOR A, B, C, D). For each part there
is a TERMINOLOGICAL DIGEST WITH TERMS IN SEQUENCE OF OCCURRENCE IN DOCUMENT
PARTS; all translators share an ALPHABETICALLY-SORTED GLOBAL TERMINOLOGICAL
DIGEST.]

NOTE: Subscript T denotes the translation of a term, e.g., term_T is the
translation of term.


Figure 5: Team of translators working cooperatively on a large
translation task, using terminological digests.
ON-LINE OPERATING SIMPLICITY


Producing a terminological digest by computer on-line
should be maximally simple in order to adapt to the needs of
an inexperienced computer user, such as the translator,
interpreter, terminologist, lexicographer, editor, or
typist. Figure 6 illustrates how a user can employ a video
display terminal to generate a terminological digest. The
system allows the user various options as to how to proceed, 
prompting him to enter his choice prior to terminological 
digest generation. In Figure 6, after having selected a 
sorted version of the terminological digest, the user has 
begun entering the desired terms, working from a printed 
text document. 


If the source text is available in machine-readable 
format, it can also be automatically retrieved and displayed
on the screen. Figure 7 shows how a portion of the English
source text of a manual, which was originally produced by
automatic typesetting, is displayed on the video screen
(first 22 lines of display). The user can move the entire
text of the manual up or down on the screen, in fact viewing
the text as through a window, looking for terms whose
translation he wishes to know. Whenever such a term occurs,
the user enters it via the keyboard, at which time it is
displayed at the left of the bottom line of the screen and
the term within the text is brightened (i.e., displayed in
double intensity, verifying to the user that it has been
selected for terminological digest generation). The user
can continue entering as many terms as desired, thereby
creating the terminological skeleton of the text to be 
translated. Hitting the ENTER key of the keyboard an extra 
time causes production of the terminological digest, which 
can then be displayed (Figure 8A) or printed on an attached 
typewriter (Figure 8B) or a high-speed printer (Figure 8C). 


Of course, the user need not display the text to be 
translated on the video screen, if he prefers to work from
its printed version. Moreover, a text may not be available
in machine-readable format, or the user may employ a
typewriter terminal where it may be inefficient to print out
portions of the text intermittently and then enter the terms
to be looked up.


During terminological digest generation the user may 
flip back and forth between his input and the 
already-entered terminological digest terms, e.g., to scan
for possible similarities or redundancies of terms. He may
also delete terms already selected. 
Figure 6: Snapshot of a video display screen during on-line
generation of a terminological digest (current system
environment is indicated to the user in double
brightness on right-hand bottom line).


Figure 7: Display of a text portion of a computer manual with
term "parameter list" being entered.


Figure 8A: Display of a portion of an English-German terminological
digest showing terms in order of textual occurrence.


edit: zum Druck aufbereiten [SYST], aufbereiten [SYST]; AUFBEREITEN
ZUM DRUCKEN (Instruktion); redigieren, revidieren, edieren 


omit: weglassen, unterlassen; uebersehen; versaeumen 

parameter list: Parameterliste [SYST] 

positional: stellenbedingt, stellenabhaengig, positionsbedingt 
filetype: Datei-Typ [VM/370] 

filemode: Datei-Modus [VM/370]; Datenbestandsart 


verification: Pruefung [SYST], Bestaetigung [SYST]; Beglaubigung, 
Bescheinigung, Beurkundung 


record length: Satzlaenge [SYST], Datensatzlaenge [SYST]; 
Laenge eines Dokumentes 


pad: auffuellen, polstern; Polster, Kissen; Puffer [MECH] 


Figure 8B: Portion of an English-German terminological digest 
showing terms in order of textual occurrence,
typed out on CMC/ST. 
edit: zum Druck aufbereiten <SYST>, aufbereiten <SYST>; AUFBEREITEN
ZUM DRUCKEN (Instruktion); redigieren, revidieren, edieren


omit: weglassen, unterlassen; uebersehen; versaeumen

parameter list: Parameterliste <SYST>

positional: stellenbedingt, stellenabhaengig, positionsbedingt
filetype: Datei-Typ <VM/370>

filemode: Datei-Modus <VM/370>; Datenbestandsart


verification: Pruefung <SYST>, Bestaetigung <SYST>; Beglaubigung,
Bescheinigung, Beurkundung


record length: Satzlaenge <SYST>, Datensatzlaenge <SYST>;
Laenge eines Dokumentes


pad: auffuellen, polstern; Polster, Kissen; Puffer <MECH>


Figure 8C: Portion of an English-German terminological digest
showing terms in order of textual occurrence,
printed on an IBM 1403 Printer.
[Flowchart: decision boxes and the action taken on YES]

RETRIEVE SOURCE TEXT
TERM FOR TERMINOLOGICAL DIGEST?            YES: ENTER TERM
                                           YES: "ROLL UP" TEXT
                                           YES: "ROLL DOWN" TEXT
LOOK AT TEXT PORTION WITH A SPECIAL TERM?  YES: "RETRIEVE" TEXT WITH SPECIAL TERM
DELETE TERM ALREADY SELECTED FROM DIGEST?  YES: "DELETE" TERM
LIST UP TO THIS POINT?                     YES: "TRANSFER" TO TERM DIGEST LIST
EXIT FROM GENERATION?                      YES: "EXIT"

NOTES:
1. Instructions in quotes may be entered by pressing a program
   function key.
2. On exit, a terminological digest will be produced if
   at least one term has been selected.


Figure 9: Flow of control of a terminological digest generation
process.
If he has at his disposal a display terminal with
program function keys, which allow communicating
instructions to the system in a most simple way by merely
pressing the key, the user may depress one of six keys for:


1. Moving text upward on the screen;
2. Moving text downward on the screen;
3. Flipping screen "pages";
4. Retrieving a screen "page" with a typed term (or
string of characters);
5. Deleting a term already entered in the
terminological digest list;
6. Transferring to the terminological digest to examine
the terms thus far entered.


Once the user has transferred to the terminological digest
environment, the program functions are symmetrical, i.e.,
the keys have the same meaning for the terminological digest
list, except that pressing the sixth key will transfer the
user back to the point in the text where he left off before
transferring to the terminological digest list. The user
may also enter any of the six instructions on the keyboard
rather than hitting the appropriate program function key;
this is mandatory if he uses a keyboard without such keys.
A typed instruction must be preceded by a > sign to signal
the system that this is not a term to be looked up. Figure 9
represents the flow of control of a terminological digest
generation process for a source text accessible through the
system.
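

The rule that a leading > sign separates typed instructions from
terms to be looked up can be sketched as follows. This is only an
illustration in Python, not the original program: the instruction
names (UP, DOWN, PAGE, FIND, DELETE, TRANSFER) and the function
classify_input are hypothetical labels for the six functions listed
above, since the actual keywords are not given here.

```python
# Hypothetical names for the six instructions; the real keywords are not given.
INSTRUCTIONS = {"UP", "DOWN", "PAGE", "FIND", "DELETE", "TRANSFER"}


def classify_input(line):
    """Classify one typed line as an instruction or a term to be looked up.

    Returns a tuple (kind, value, argument), where kind is either
    "instruction" or "term".
    """
    text = line.strip()
    if text.startswith(">"):
        # A leading > sign marks an instruction rather than a term.
        parts = text[1:].split(None, 1)
        if not parts:
            raise ValueError("empty instruction")
        name = parts[0].upper()
        if name not in INSTRUCTIONS:
            raise ValueError("unknown instruction: " + name)
        argument = parts[1] if len(parts) > 1 else ""
        return ("instruction", name, argument)
    # Anything else is treated as a term for the digest.
    return ("term", text, "")
```

Under these assumptions, typing "record length" yields a term to be
looked up, while ">find pad" yields the FIND instruction with the
argument "pad".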


From a human factors engineering point of view, the
keyboard of the display terminal, which provides the main
contact of the user with the computer system, basically
operates like a typewriter but, in several ways, offers
greatly improved performance for a translator or editor who
may be a non-typist. The keys react immediately to being
pressed by the user and so quietly that someone directly
adjacent to him may not hear the operation.
Visual communication is made easy by the display of
large-sized characters within sufficient context on the
screen. Since typing on the keyboard "prints" characters on
the screen instead of on paper, error correction is greatly
improved. When the user moves the cursor* to the error (or any
characters to be changed) and keys in the correct
characters, the previous information is overlaid by the new
data. Changing information in the middle of text can also
be quickly performed with the aid of the cursor: for
example, when data are inserted or deleted in the middle of
text, the immediately following data are automatically
shifted forward in the case of insertions, and automatically
contracted in the case of deletions. Moreover, using the
ERASE INPUT key causes the information just typed (but not
yet entered) to be blanked, while using the CLEAR key
immediately clears the entire display without causing a
disturbance within the system.


* A movable underscore marking the position on the screen
that the character entered from the keyboard will occupy.


INTERPRETATION AND ON-LINE GENERATION OF TERMINOLOGICAL DIGESTS 




Since, at the user's option, the translation of a
queried term may be immediately displayed as soon as the 
term is entered, the system could very well be used by 
interpreters during consecutive as well as simultaneous 
interpretation. As described in [47: 154], 


It is generally believed that an interpreter
cannot consult reference works or colleagues as a
translator can. This is only partially true. I
have always found it useful to have pertinent
dictionaries (whether general or specialized) in
the booth, which can be consulted by one's
boothmate or by the interpreter himself. Since an
unknown or vaguely known word is likely to crop up
more than once, it should be looked up in one of
three ways: (a) by the boothmate immediately after
it occurs; (b) by the interpreter during his rest
period; or, if (a) is not possible and recourse to
(b) would mean risking a second or even third
encounter with the problematical word, then (c) by
the interpreter himself while he is still
interpreting. Boothmates, too, may be queried
with an inquisitive look and a note can often be
slipped to a coworker in a neighboring booth. Such
consultations should of course never interrupt the
flow of the interpretation. An interpreter should
also be alert to difficult words which a boothmate
or other colleague may have to interpret and
should therefore not hesitate to slip appropriate
notes at the proper time. This is often the case
when one of the interpreters has specialized
knowledge and can sometimes even anticipate the
appearance of a difficult word or expression with
amazing accuracy.


Thus, a display terminal, which has great flexibility
with regard to siting and no air-conditioning requirements,
could be a valuable aid for an interpreter who has to
consult special dictionaries extremely rapidly for difficult
terms encountered during a discourse. Because on-line
lookup is generally faster than flipping pages [42], such
queries need not infringe upon the flow of the interpreter's
speech. In addition, all terms looked up can be
simultaneously saved by the system in a file representing,
in effect, the terminological "minutes" of a meeting.
Therefore, it is conceivable that rapid on-line lookup of
specialized terminology may enhance the process of
interpretation and increase the fidelity of the
interpreter's work.


Whenever possible, interpreters are urged to collect 
all documentation required for interpretation before every 
conference and scrutinize this material for specialized 
terminology to prepare its translation. Moreover, 


just as the translator, the interpreter should
build up a glossary of technical terms both for
his own use and that of his colleagues. These can
be circulated among the interpreters at the end of
each session and are of course kept for future
conferences. Delegates can also usually be
queried after the session on difficult terms
[41: 155].


Thus, most interpreters and translators have their 
private terminology lists in addition to the official 
reference material. The system and equipment described will 
permit the user to rapidly input, update and display
terminology lists and dictionary excerpts via the display 
unit and/or typewriter-terminal. 


APPENDIX: SORT MECHANISM AND MEMORY LAYOUT OF TERMS


This section describes the essentials of the memory
layout and access scan of the terms used for terminological
digest generation, whether performed in order of textual
occurrence or in alphabetically-sorted order. When the
terms selected for the terminological digest come into the
computer, they are available in the original sequence, and,
after (optional) application of a sort routine, in
alphabetically-sorted sequence in the same memory area as
well. They may then be retrieved in either sequence, looked
up in a dictionary according to the procedure mentioned in
[33] and output in soft or hard copy as a digest for the
user.


Figure 10 represents a section of the memory with terms
after they have entered the system. The terms, which can be
of practically unlimited length, are placed adjacent to each
other in main memory as they come in, according to a
variable-length storage scheme. Each term is preceded
initially by a length code and an empty memory cell. If the
terms are to be sorted alphabetically, the sorting process
inserts in each such empty cell a pointer (the "chain
pointer") indicating the position of the next term in
alphabetical sequence. If a terminological digest is
requested in textual term order, this pointer is irrelevant:
the terms are scanned sequentially in memory from left to
right and the dictionary search is performed on each term.
If a terminological digest is requested in alphabetical
order, the terms are first "sorted" by chaining them one by
one in alphabetical sequence; as each new term is processed,
the partially formed chain is scanned from the beginning in
order to determine where the term is to be inserted. After
the sort procedure, the terms are accessed through their
chain pointers in alphabetical sequence for the dictionary
search. The flow chart (Figure 11) describes this
sort/comparison mechanism for variable-length terms.
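

The variable-length storage scheme described above can be
illustrated with a short Python sketch. The field widths (a 4-byte
chain-pointer cell and a 2-byte length code), the EMPTY marker value
and the function names are assumptions made for this illustration
only; the source does not specify the actual sizes or encoding.

```python
import struct

# Assumed record header: 4-byte chain pointer followed by a 2-byte length code.
HEADER = struct.Struct(">IH")
EMPTY = 0xFFFFFFFF  # assumed marker for a chain-pointer cell not yet filled in


def pack_terms(terms):
    """Place the terms adjacent to each other, each preceded by its header."""
    area = bytearray()
    for term in terms:
        data = term.encode("ascii")
        area += HEADER.pack(EMPTY, len(data)) + data
    return area


def scan_textual_order(area):
    """Scan the memory area from left to right, yielding (location, term)."""
    offset = 0
    while offset < len(area):
        _pointer, length = HEADER.unpack_from(area, offset)
        start = offset + HEADER.size
        yield offset, area[start:start + length].decode("ascii")
        offset = start + length  # the next record begins right after this term
```

A sequential scan of this kind ignores the pointer cell entirely,
which is all that is needed for a digest in textual term order.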


The sort process uses, in addition to the pointer 
attached to each term, five "global" pointers to keep track 
of the terms being worked on. These pointers are referred to 
on the flowchart as FRESPNT, SAENTRY, GRVALAD, TEMPS, and
PREVOLD. 


Looking at a cycle of the sort process, assume a
start-address pointer (SAENTRY) points to a partial sorted
chained sequence and a high-value pointer (GRVALAD) points
to the latest, i.e., alphabetically highest, term in that
sequence. The term pointed at by the high-value pointer has
already been chained to the current term (pointed at by
FRESPNT), which is yet to be examined. At this moment it is
not yet known whether the current term is alphabetically
higher or lower than the highest-valued term of the chain:
if it is higher it will remain where it is in the chain; if
it is lower it will be rechained. The current term is first
compared against the term pointed at by SAENTRY. If the
current term is lower, it is attached to the front of the
chain, so that SAENTRY will point to it from now on.
Otherwise, the whole chain is scanned beginning with the
start-address pointer and each of its terms is compared
against the current term. A next-term pointer (TEMPS) is
used to scan the chain and a previous-term pointer (PREVOLD)
trails the next-term pointer by one item, making it possible
to "insert" the current term into the chain when the next
term is not lower than the current term. If the "not-lower"
comparison is not encountered until the end of the chain is
reached, it is certain to be encountered when eventually the
current term is compared to itself. The end-of-chain
condition is detected by the current term having an empty
chain pointer field. In such a case, the current term
remains chained where it is, but the high-value pointer is
advanced to point to the current term. Whatever course the
comparison has taken, the program then chains the next
unexamined term in memory to the last term of the chain,
which is pointed at by the possibly updated high-value
pointer.* The program is now ready to repeat the sort cycle.
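

The sort cycle just described can be sketched in Python as follows.
This is a simplified illustration, not the original
assembler-language routine: list indices stand in for memory
addresses, None stands in for an empty chain-pointer cell, and the
names chain_sort and in_chain_order are hypothetical. The variables
start, high, fresh, nxt and prev play the roles of SAENTRY, GRVALAD,
FRESPNT, TEMPS and PREVOLD, respectively.

```python
def chain_sort(terms):
    """Chain the terms alphabetically without moving them.

    Returns (start, chain): start is the index of the alphabetically first
    term, and chain[i] is the index of the term following term i, or None
    at the end of the chain.
    """
    chain = [None] * len(terms)       # one chain-pointer cell per term, all empty
    if not terms:
        return None, chain
    start = 0                         # role of SAENTRY
    high = 0                          # role of GRVALAD
    for fresh in range(1, len(terms)):    # role of FRESPNT
        chain[high] = fresh           # chain the unexamined term to the tail first
        if terms[fresh] < terms[start]:
            chain[fresh] = start      # attach to the front of the chain
            start = fresh
            continue
        prev, nxt = start, chain[start]   # PREVOLD trails TEMPS by one item
        # No end-of-chain test is needed inside the loop: at the latest,
        # the current term is compared with itself and is "not lower".
        while terms[nxt] < terms[fresh]:
            prev, nxt = nxt, chain[nxt]
        if chain[nxt] is None:        # empty pointer: the scan reached the current term
            high = fresh              # it stays where it is; advance the high-value pointer
        else:
            chain[fresh] = nxt        # insert the current term between prev and nxt
            chain[prev] = fresh
    chain[high] = None                # close the chain after the last cycle
    return start, chain


def in_chain_order(terms, start, chain):
    """Thread through the chain pointers to list the terms alphabetically."""
    ordered = []
    index = start
    while index is not None:
        ordered.append(terms[index])
        index = chain[index]
    return ordered
```

Because only the pointer cells change, the list of terms itself
still holds the order of textual occurrence after the sort, so
either sequence can be retrieved from the same data.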


A tight comparison mechanism for arbitrarily long
comparands (terms) is attached to the sort function, making
it possible to compare up to 232 characters (although such
long terms would certainly never be encountered).


The advantage of employing the above memory layout and
sort mechanism for terminological digest generation is
fourfold:


1. Terms to be sorted need not be limited in length or of
   fixed length.

2. Since the "sorting" is actually done by pointer-updating
   only, the data (terms) to be sorted are not moved in
   memory.

3. The terms remain in their original order in the memory
   area, enabling the user to optionally fetch the terms in
   original or in sorted sequence, by retrieving the terms
   sequentially or by threading through the chain pointers,
   respectively.

4. If sorting is done in virtual memory, merging phases are
   not required, since merging is replaced by the automatic
   system-paging mechanism [43].


* The advantage of chaining a record to the sorted sequence
before that record is examined is that the check for the
end-of-chain condition can be coded outside the inner
comparison loop, thus making the loop faster. (In addition,
great efficiency and compactness of the comparison/sort
routine in machine code is maintained by the use of
assembler-language programming.)


[Figure 10 diagram: adjacent records in memory, each consisting of
POINTER | LENGTH | TERM, beginning at LOCATION 0, LOCATION 1,
LOCATION 2, ..., LOCATION n.]


Figure 10: Conceptual memory section containing terms for a
terminological digest.


SORT/COMPARISON TECHNIQUE


[Figure 11 flowchart. Only the following elements are recoverable.]


NOTES:

1. SAENTRY, GRVALAD, FRESPNT, TEMPS AND PREVOLD
   ARE GLOBAL POINTERS KEEPING TRACK OF THE TERMS
   BEING WORKED ON.

2. pointer IS A LOCATION INDICATOR ASSOCIATED WITH term
   (CHAIN POINTER FIELD)


BOXES AND DECISIONS:

INITIALIZATION
END OF TERMS TO BE SORTED? YES: SAENTRY POINTS TO FIRST TERM IN THE
SORTED SEQUENCE TO BE LOOKED UP FOR DIGEST GENERATION
COMPARISON ROUTINE
pointer [FRESPNT] <- SAENTRY; SAENTRY <- FRESPNT
FRESPNT <- FRESPNT + 1
pointer [GRVALAD] <- FRESPNT
IS pointer [TEMPS] EMPTY?
GRVALAD <- FRESPNT
pointer [FRESPNT] <- TEMPS
pointer [PREVOLD] <- FRESPNT


CONCEPTUAL MEMORY LAYOUT AT THE START OF SORT: a row of terms,
each with an associated pointer marked EMPTY.


NAME BEDRICH CHALOUPKA


INSTITUTION XONICS, INC., MCLEAN, VIRGINIA


Use following space for an abstract or summary of your project


Title of Project MACHINE TRANSLATION


The Xonics Machine Translation is a fast and efficient system for
the translation of Russian into English and for translation of other languages
which have similar grammatical features. This system is representative
of the philosophy that effective Machine Translation is one which simulates
the activities of the human translator.


The computer programs which make up the system are written in
the PL/1 language. They will operate on an IBM 360 or 370 computer. The
entire system uses less than 100,000 bytes of computer memory for
operation. Translation can be done in three different modes:


(a) Batch - for translation of large quantities of text.

(b) Sentence-by-sentence - for translation of abstracts and short
    articles.

(c) Interactive - for translation utilizing teleprocessing.


The system contains supporting programs for updating dictionaries.
Both translation and updating of dictionaries can be done through teleprocessing.


NAME John Chandioux


INSTITUTION TAUM Project, University of Montreal


Use following space for an abstract or summary of your project


Title of Project Meteo Weather Forecast Translation


Meteo is an automatic system for the translation of weather forecasts
from English into French. Public forecasts for the whole of Canada
are sent directly to the system via a communications network. The
sentences accepted by the system do not need to be edited or revised.
The remaining sentences are extracted by an interactive editor and
displayed on a screen terminal for translation by a human translator.
Meteo has been operating on an experimental basis 24 hours a day since
December 1975 in parallel with the translation bureau. It will be fully
operational by May of 1976 and presently produces 30,000 words per
day. The actual translation time spent corresponds to over 1,000 words
per minute, and estimated all-inclusive costs are two cents per word.


NAME CHANDIOUX John


INSTITUTION TAUM, Université de Montréal


Use following space for an abstract or summary of your project


Title of Project Leibnitz, Multilingual system


Leibnitz is an international cooperation between
computer translation centers interested in a multilingual
system. Several European groups, the TAUM project from the
Université de Montréal and a Brazilian group are presently
working on this project. Most parts of the system are being
written in one of the three languages made available by the
CETA in Grenoble. The first one is the ATEF language, a string
tree transducer for dictionary look-up and morphological
analysis. The second one is CETA and is a tree manipulating language
for both transfer and generation. The last one is a tree/string
transducer to be completed sometime in summer of 76.

Each group is either working on the design of an
analyzer or generator for a specific language or on the
transportability of the available formalisms. Research is presently
under way on French, German, English, Italian, Portuguese and
Russian. English analysis is done by the TAUM team, which is
presently experimenting with a parser written in REZO, its own
version of Woods' Augmented Transition Networks. All
participating groups have agreed on a normalized tree representation
for the output of analyzers and input of generators in order
to minimize problems in the design of transfer components. The
first part of the system is expected to be operational within
two years.


NAME Major Lynn M. Hansen


INSTITUTION Foreign Technology Division (FTD)


Use following space for an abstract or summary of your project


Title of Project FTD Machine Translation


FTD has been utilizing machine translation since September
of 1963 when the IBM MARK II system was installed at Wright-
Patterson Air Force Base. The current FTD machine translation
system became operational in July 1970. Since that time nearly
constant improvement has been made through a series of external
optimization contracts and in-house update efforts.


graphics merged onto the computer printout; the preliminarily


has been completely edited with proper syntactical changes and
then retyped in camera-ready format.


The bulk of the machine translations at FTD deals with S&T
subject matter; therefore, FTD glossaries and lexicographic
routines are basically scientifically oriented. However, the
system as it now exists will provide quite adequate indicative
translations of almost any material.


NAME Fred C. Hutton


INSTITUTION Union Carbide Corp., Nuclear Div., Oak Ridge, TN
Use following space for an abstract or summary of your project


Title of Project Georgetown University MT System Usage at Oak Ridge, Tennessee


Ten years' experience in running the programs on the IBM 7090 is
described. The present system, reprogrammed for the IBM 360, is
described and capabilities of the system are set forth. An example of
the use of the language invented by A. F. R. Brown (SLC for "Simulated
Linguistic Computer") used in the preparation of the dictionary and
linguistic routines, will be presented.


Approved For Release 2008/03/03 : CIA-RDP80T00294A001200010014-5 



NAME Erhard O. Lippmann


INSTITUTION IBM T. J. Watson Research Center, Yorktown Heights, New York


Use following space for an abstract or summary of your project


Title of Project Experimental On-Line Computer Aids for the Human Translator


An exploratory computer-aided translation system is being
developed which basically consists of storage and retrieval operations
carried out on line with a computer during the time in which a translation
is produced. The system is not programmed to simulate the human
translator by producing automatic translations. Rather, the user can
call upon the computer's resources as needed in the translation process
to shorten the delay between the initiation of a translation and its finished
version. A combination of terminals, computer devices, and software
is used to perform functions which have habitual human counterparts of
a mechanical nature, e.g., dictionary look-up, dictionary updating,
creation of terminological digests (i.e., text-related mini-dictionaries),
semi-automatic editing, generation of cross-reference files, text
statistics, printing and layout, and automatic combination, insertion,
deletion, or duplication of text.




NAME Automated Language Processing Project; Dr. Eldon G. Lytle, Director


INSTITUTION Brigham Young University, Provo, Utah


Use following space for an abstract or summary of your project
Title of Project Automated Language Processing Project


The Project emphasizes the refinement of computer-assisted translation,
as opposed to fully automatic translation, and has devised for this purpose
techniques of man-machine interaction which utilize the human for those
aspects of the translation task requiring human intelligence and the computer
for those aspects of the translation task which can be managed mechanically.
Junction Grammar, a new theory of language structure which captures
linguistic universals hitherto unknown, serves as the basis for the system.


Phase I of the development (now operational) provides computer editing,
file management, and dictionary lookup. Phase II of the development provides
computerized analysis, transfer, and synthesis of sentence structure
(implementation 1978-79). Prototype systems are designed for translation from
English to Spanish, French, German, and Portuguese, but the method is
equally adaptable to any combination of source and target languages.


The primary sponsor of BYU ALP is the Church of Jesus Christ of
Latter-day Saints (Mormon), which annually translates approximately 17,000 pages of
material into more than fifty (50) languages. It is planned that dictionary
lookup and linguistic processing will initially be accomplished at a large
central installation. The output of this processing will then be forwarded
on "floppy" disks to regional translation centers around the world where
residual aspects of the translation and printing task will be accomplished
with the aid of mini-computer work stations.


The Project has a staff of 12 full-time and 18 part-time researchers.


NAME Roger C. Schank
INSTITUTION Yale University
Use following space for an abstract or summary of your project


Title of Project Computer Understanding of Text


Research at Yale centers around the building of computer programs that will
understand stories. Two programs are currently being developed, SAM and PAM.


SAM is composed of the following:

1) an analyzer that maps English into a deep conceptual representation.

2) a script applier that uses its knowledge of contexts to supply missing
   or implicit inferences about a situation.

3) a memory that finds references for things that it knows about in a text so
   as to bring its knowledge to bear on the text.

4) a generator that reads information provided to it by (1), (2), and (3) and
   states that information in English, Chinese, Russian, Dutch or Spanish.

5) a question answerer that interacts with the script applier to answer
   questions about an input text.


SAM is capable of mechanical translation, automatic summary and paraphrase,
and question-answering about texts in domains that it has knowledge about.


PAM is like SAM except that it does not have a script applier but instead
has a more general mechanism to infer the goals and intentions
of the actors in the stories it hears.


Both of these programs are beginning approaches to the problem of
computer understanding.


NAME Robert J. Shillman, Ph.D.


INSTITUTION M.I.T.


Use following space for an abstract or summary of your project


Title of Project Optical Character Recognition Based on Phenomenological Attributes


A theory of character recognition has been proposed and
a methodology has been developed which is expected to yield
a machine algorithm that will equal human performance in
the recognition of isolated, unconstrained, handprinted characters.
The methodology is based on the study of ambiguous characters,
characters that can be assigned two letter labels with equal
probability, rather than on letter archetypes. A description
of the underlying representation of each of the 26 upper case
letters of the English alphabet was obtained through analysis
of ambiguous characters which were generated for this purpose.
The descriptions are in terms of an abstract set of invariants,
called functional attributes, and their modifiers. The
relationship between the physical attributes, derived from physical
measurements upon a character, and the functional attributes
is given by a set of rules called Physical to Functional Rules.
Three different techniques for determining these rules through
psychophysical experimentation have been tested, and the particular
rule for the attribute LEG has been determined. The remaining
rules can be obtained in a similar fashion, and the combined
results are expected to provide the basis for a machine algorithm.
We are currently investigating the Physical to Functional Rules
for the remaining attributes and are also interested in the
way in which the rules are to be combined.


NAME Robert F. Simmons


INSTITUTION University of Texas, Austin, Texas
Use following space for an abstract or summary of your project


Title of Project TEXT INFORMATION SYSTEMS


A developmental program is proposed to create a socially useful
system that will integrate several existing natural language processing
procedures into a robust, transportable, General Text Understanding
System for eventual use in applied information centers. The proposal
is comprised of seven tasks: 1. Continued development of quantified
case predicate forms of conceptual memory structure. 2. Integration
of question answering and problem solving procedures. 3. Development
of a human-aided, multi-pass, text-to-memory compiler. 4. Generation
of natural language outputs for summaries, abstracts, expansions,
translations, etc. 5. Generation of special purpose text teaching
materials. 6. Implementation of natural language dialogue capabilities.
7. Development of a textword management system for linguistic
analysis, retrieval and lexicon development.

The work will be accomplished on a DEC10 to enhance the
transportability and communication of documentation for the resulting system.


NAME Dr. Peter Toma, President and Chairman of the Board


INSTITUTION LATSEC, Inc. and World Translation Center, Inc.
Use following space for an abstract or summary of your project


Title of Project SYSTRAN


After having developed the SERNA, AUTOTRAN, and TECHNOTRAN
machine translations, I felt that the advent of third generation
computers provided the long-awaited opportunity to develop a
large-scale, yet fast and economical, systematically planned,
unified, universal system. That system is SYSTRAN, whose name is
an acronym formed in 1964 from "Systems translation."

SYSTRAN is a fully operational machine translation system which
can be installed at any IBM 360/370 site within hours. It has been
used by the Air Force (translating 15 million words a year) since
1970 and by NASA since 1973. SYSTRAN is fully automatic and requires
neither human intervention nor pre-editing. It translates between the
following language pairs at a speed of 300,000 words per hour:
Russian-to-English, English-to-Russian, English-to-French,
German-to-English, and Chinese-to-English. We term it a universal
translation system because of this and because of the ease with
which new translation capabilities can be added.

SYSTRAN's success is due to its strong and very flexible
software frame, which allows the immediate implementation and testing
of linguistic hypotheses, as well as universality in handling
natural languages. Moreover, its special macro language allows
linguists to program their own rules. The system can be modified
or expanded to any limit at any time. It can never become a
"black box."

The complete SYSTRAN package includes all utility programs,
dictionary creation and update subsystems, source language analysis
programs and target language generation programs, as well as
programs for development of frequency listings, concordance materials,
etc. There are separate dictionaries for stem entries and
idiomatic expressions (which are also entered in stem form). Because
lexical items are entered in stem form, and because of a complex
cross-referencing system, it is necessary to enter any lexeme only
once, accompanying it with paradigmatic set information.

Source language analysis programs begin at homograph resolution,
proceeding through establishment of immediate constituents (IC's),
to establishment of syntactic relationships of IC's and
establishment of clause types and clause boundaries. Semantic analysis is
used not only in selecting proper target language meaning
equivalents, but also in establishing certain syntactic relationships.

Target language generation includes structural transformations,
synthesis of target language word forms including insertion of
auxiliaries, prepositions, etc., and rearrangement within clauses
to achieve standard target language word order.
Further development of pronoun translation and prepositional


NAME William S-Y Wang


INSTITUTION University of California, Berkeley


Use following space for an abstract or summary of your project


Title of Project Project on Linguistic Analysis


Research on machine translation from Chinese to English under the
direction of William S-Y Wang was carried on at the Project on
Linguistic Analysis (University of California, Berkeley) during the
period 1967 to 1975. During the early part of the effort, System I was
developed, which includes: a) CHIDIC: a Chinese to English machine
dictionary of about 80,000 entries (60 percent physics, 30 percent
biochemistry, and 10 percent general), and b) a monolithic grammar of
about 4,000 rules (context-free, phrase-structure rules). In 1973, two
factors caused redesign of the approach toward the development of
System II. First, the grammar had become so cumbersome and ad hoc
that its effectiveness as well as its potential for improvement were
curtailed. Second, the sponsor requested conversion of the system from
CDC machines to IBM machines. In response to these factors, System
II is designed along the lines of "structured programming" (i.e., it is
built on self-contained program modules). It is also designed to be
machine-independent, so that it can be implemented at different computer
installations.


Efforts in research and development have been aimed at an operational
system. We have experimented with numerous trial sentences as well
as several "live" texts (from articles of 3,000 characters in length) and
have accumulated machine texts of over 560,000 characters. System II
is incomplete, lacking especially the machine-editing of output to
conform to those morphological features absent in Chinese but required
in English.
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NAME Yorick Wilks 


INSTITUTION Dept. of Artificial Intelligence, Univ. of Edinburgh, UK. 


Use following space for an abstract or summary of your project 


Title of Project | An AI Approach to MT 


The present system takes in paragraphs of English on line and out- 
puts paragraphs of French. It is very small with a vocabulary of about 
59-600 word senses, but that is very large for a project of this sort. By 
"this sort''I mean projects that aim for some deep semantic representation 

. of the input language and from which the translation is produced. There is 
no separable syntactic stage in this work; the text is fragmented (into 
clause and phrase length pieces) by the program, and semantic structures 
are attached directly to these. These semantic structures are called 
templates and correspond to "mini-assertions." That is to say, the pro- 
gram seeks to display the input as a sequence of mini assertions. These 
templates are constructed out of formulas, already available from a diction- 
ary, for the word sense of the input. Each word sense has a formula for it, 
and much of the work in the program is ascertaining what is the correct 
word sense (and so correct formula) for an input word. The formulas are 
tree structures built up out of different types of semantic primitive. A 
formula has internal rules operating on these semantic primitives that 
enable it to express the meaning of the corresponding word sense. Once the 
templates have been formed up, various kinds of inference rules operate on 
them, to produce deeper semantic representations, so as to resolve 
remaining ambiguities of word sense, prepositions or pronoun reference. 
When a clear single template representation has been obtained, a French
representation can be generated from it. 
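
As a rough illustration of how word-sense formulas might combine into a template, the Python sketch below gives each word sense a head primitive plus a small tree of further primitives, and keeps only the sense combinations whose heads form an allowed agent-action-object pattern. The primitive names, the tiny dictionary, and the BARE_TEMPLATES inventory are assumptions made for illustration; they are not taken from the project's dictionary, and the inference rules mentioned above are not modeled.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Formula:
    """One word sense: a head primitive plus a tree of further primitives."""
    sense: str          # label for the word sense
    head: str           # head primitive, used when matching templates
    tree: tuple = ()    # nested tuples of primitives (not interpreted here)


# Hypothetical dictionary: every word maps to the formulas of its senses.
DICTIONARY = {
    "policeman": [Formula("policeman", "MAN", (("FORCE", "DO"),))],
    "shepherd": [Formula("shepherd", "MAN", (("BEAST", "KEEP"),))],
    "interrogated": [Formula("interrogate", "ASK", (("MAN", "OBJE"),))],
    "has": [Formula("have", "HAVE")],
    "crook": [
        Formula("crook/criminal", "MAN", (("NOTGOOD", "DO"),)),
        Formula("crook/staff", "THING", (("LINE", "KIND"),)),
    ],
}

# Assumed inventory of allowed agent-action-object head patterns.
BARE_TEMPLATES = {("MAN", "ASK", "MAN"), ("MAN", "HAVE", "THING")}


def templates_for(agent, action, obj):
    """Return the sense triples whose head primitives form an allowed pattern."""
    matches = []
    for a, v, o in product(DICTIONARY[agent], DICTIONARY[action], DICTIONARY[obj]):
        if (a.head, v.head, o.head) in BARE_TEMPLATES:
            matches.append((a.sense, v.sense, o.sense))
    return matches
```

With these made-up entries, templates_for("policeman", "interrogated", "crook") keeps only the "criminal" sense of "crook", while templates_for("shepherd", "has", "crook") keeps only the "staff" sense.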




NAME MICHAEL ZARECHNAK 


INSTITUTION Georgetown University School of Languages and Linguistics


Use following space for an abstract or summary of your project 


Title of Project Georgetown University General Analysis Techniques (GAT)


Simulated Linguistic Computer


The Georgetown University Russian-English System is running on IBM
360/70. CPU time for 2000 words is approximately 9 seconds. The texts
translated include scientific, technological, and economic materials.

M. Zarechnak in close cooperation with the linguistic research staff.
The linguistic statements are coded in symbolic language designed by
Dr. A. Brown ('SLC' = Programming Language). Input/output is in Assembler
language.

A dictionary entry contains a split or unsplit Russian stem, grammatical
coding, lexical number, and English part. The clustered entries
are recognized through special local operations when the calling signals
occur within the sentence under processing.
Syntactic analysis is partly based on morphosyntactic markings and
partly on semantic coding.
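
The dictionary entry described above can be pictured as a small record. The Python sketch below is only an illustration: the field names, the sample stems and codes, and the longest-stem lookup are assumptions, not the entry format or the lookup procedure used by the Georgetown system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DictionaryEntry:
    stem: str            # Russian stem (transliterated here for readability)
    grammar_code: str    # grammatical coding (hypothetical values)
    lexical_number: int  # lexical number identifying the entry
    english: str         # English part


# Hypothetical sample entries, indexed by stem.
ENTRIES = [
    DictionaryEntry("atom", "N-masc", 1001, "atom"),
    DictionaryEntry("atomn", "ADJ", 1002, "atomic"),
    DictionaryEntry("reaktor", "N-masc", 1003, "reactor"),
]
BY_STEM = {entry.stem: entry for entry in ENTRIES}


def lookup(word: str) -> Optional[DictionaryEntry]:
    """Return the entry for the longest stem that is a prefix of `word`."""
    for end in range(len(word), 0, -1):
        entry = BY_STEM.get(word[:end])
        if entry is not None:
            return entry
    return None
```

Trying the longest prefix first lets an inflected form such as "atomnyj" reach the adjective stem "atomn" rather than the shorter noun stem "atom".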
Users: Primarily scientists at ORNL. Users’ comments essentially favorable.

The unedited translation is used primarily for information purposes,
although in a few instances the translations were post-edited when
the user requested it.

The quality of the present translation is the same as it was in 1964.
No linguistic improvements were inserted in the system although there
are some linguistic programs ready to be inserted.

The semantic level will be added. Its underlying procedures are based
on the semantic collocational and colligational distributional patterns
as observed in the real corpora, with such generalization as these corpora
would suggest. It is hoped that after large corpora have been described
both semantically and analytically, some theories might be
developed and tested deductively for the improvement of the next MT
cycle. Each sentence is scanned from left to right and from
right to left at least forty times, following a path of certain
priority-based strategies. All these scannings in both directions are
grouped into four levels: word recognition, syntagmatic, syntactic, and
synthesis of English. Some parts of the synthesis are independent of
the Russian input.

Size of the dictionary: approximately 50,000 stems.


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NAME S. C. Loh 


INSTITUTION Chinese University of Hong Kong


Use following space for an abstract or summary of your project 


Title of Project Chinese University Language Translator (CULT) 


The Chinese University Language Translator (CULT) is a
Chinese-English computer-translating system which is unique in that
it utilizes pre-editing of the source text as opposed to post-editing of
the target text. The system is essentially a "pragmatic" one, in that
the rules for handling complex strings previously requiring pre-editing
are introduced as needed. CULT is made up of four modules: Dictionary
look-ups, Syntactic Analyzer, Semantic Analyzer, and Output. Among
these, the most limited is the Semantic Analyzer, which seems to rely
more heavily on pre-editing than the other modules.

CULT is currently being used to translate two Mainland scientific
journals, ACTA Mathematica Sinica and ACTA Physica Sinica. The
computer, however, is not a person, which means it cannot experience
the feeling of the original text and is not of much use for translating
literary material. Nevertheless, scientific and some non-scientific
works are also within the scope of its capabilities.
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NAME Jim Mathias - 


INSTITUTION CETA (U.S.A.)
Use following space for an abstract or summary of your project


Title of Project Chinese-English On-Line Retrieval


The CETA (Chinese-English Translation Assistance) group is building
a machine-readable dictionary file for use in on-line retrieval and for
development of dictionaries and indexes for use by human translators.
The experimental on-line retrieval system can store an unlimited number
of entries. The current file of 640,000 machine-readable entries is
divided into approximately 110,000 general entries, 10,000 colloquial
entries, and 500,000 scientific and technical Chinese-English entries.
The experimental system designed for an IBM 360 illustrates the facility
of computer storage, retrieval, and display of Chinese characters and
Roman alphabet as well as other scripts. It also illustrates the facility
of computer techniques for indexing Chinese characters and special
adaptability for synthesizing Chinese queries to search telecode-sorted files.
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NAME Friedrich Krollmann 


INSTITUTION Federal Bureau of Languages 


Use following space for an abstract or summary of your project 


Title of Project: FRG Translation Aid System 


Germany's Federal Bureau Computer Translation Aids System
contains over 700,000 foreign language (English, French, Russian,
and Portuguese)-German entries of a technical and scientific nature.
These entries can be accessed in a number of different ways depending
on the needs of the user. Thus, the programming of the system allows
for more specialized foreign language-German glossaries and lexical
concordances, as well as linguistic analysis and frequency counts on
the technical vocabulary of a given language.
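
A minimal Python sketch of two of the outputs mentioned above, a frequency count and a specialized glossary drawn from a larger term bank. The sample English-German entries, the TERM_BANK name, and the simple tokenization are assumptions for illustration only; nothing here reflects the Bureau's actual programs or data.

```python
import re
from collections import Counter

# Hypothetical term bank: foreign-language term -> German equivalent.
TERM_BANK = {
    "gearbox": "Getriebe",
    "bearing": "Lager",
    "shaft": "Welle",
    "valve": "Ventil",
}


def frequency_counts(text):
    """Count how often each lowercase alphabetic word occurs in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)


def glossary_for(text):
    """Build an alphabetical glossary of the term-bank entries found in the text."""
    counts = frequency_counts(text)
    return {term: TERM_BANK[term] for term in sorted(counts) if term in TERM_BANK}
```

Restricting the glossary to terms that actually occur in a given text is one simple way to obtain a more specialized word list from a large general file.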
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